Cohesion and Repulsion in Bayesian Distance Clustering

Abstract

Clustering in high dimensions poses many statistical challenges. While traditional distance-based clustering methods are computationally feasible, they lack a probabilistic interpretation and rely on heuristics for estimating the number of clusters. On the other hand, model-based techniques often fail to scale, and devising algorithms that are able to effectively explore the posterior space is an open problem. Based on recent developments in Bayesian clustering, we propose a hybrid solution that entails defining a likelihood on pairwise distances between observations. The novelty of the approach consists in including both cohesion and repulsion terms in the likelihood, which allows for cluster identifiability. This implies that clusters are composed of objects which have small "dissimilarities" among themselves (cohesion) and similar dissimilarities to observations in other clusters (repulsion). We show how this modelling strategy has an interesting connection with existing proposals in the literature as well as a decision-theoretic interpretation. The proposed method is efficient and applicable to a wide variety of scenarios. We demonstrate the approach in a simulation study and an application in digital numismatics.
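As a rough illustration of a likelihood defined on pairwise distances, the sketch below scores a candidate partition by adding a cohesion log density for every within-cluster pair and a repulsion log density for every between-cluster pair. The abstract does not state which densities the paper uses; the Gamma densities, their parameter values, the function names, and the toy data are all assumptions made for this example only.

```python
import math
from itertools import combinations


def gamma_logpdf(x, shape, rate):
    """Log density of a Gamma(shape, rate) distribution at x > 0."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)


def pairwise_log_likelihood(dist, labels,
                            cohesion=(2.0, 4.0), repulsion=(8.0, 2.0)):
    """Log likelihood of a partition given a matrix of pairwise distances.

    Within-cluster pairs use the cohesion density and between-cluster pairs
    use the repulsion density. The Gamma form and the default (shape, rate)
    values are illustrative assumptions, not taken from the paper.
    """
    total = 0.0
    for i, j in combinations(range(len(labels)), 2):
        shape, rate = cohesion if labels[i] == labels[j] else repulsion
        total += gamma_logpdf(dist[i][j], shape, rate)
    return total


# Toy one-dimensional data: two well-separated groups of three points.
points = [0.0, 0.3, 0.5, 4.0, 4.2, 4.6]
dist = [[abs(a - b) for b in points] for a in points]

good = [0, 0, 0, 1, 1, 1]  # partition matching the two groups
bad = [0, 1, 0, 1, 0, 1]   # partition mixing the two groups

print(pairwise_log_likelihood(dist, good) > pairwise_log_likelihood(dist, bad))
```

With these assumed densities, the partition that keeps nearby points together scores higher than the one that mixes the two groups. In a Bayesian treatment this term would be combined with a prior on partitions, which the sketch leaves out.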

Similar resources

Cohesion and Cohesive Devices in a Contrastive Analysis between GE and ESP Texts

The present study was an attempt to conduct a contrastive analysis between General English (GE) and English for Specific Purposes (ESP) texts in terms of cohesion and cohesive devices. To this end, thirty texts from different ESP and GE textbooks were randomly selected. They were then analyzed manually to find the frequency of cohesive devices. Cohesive devices include reference, substitution, ...

The Clustering and Classification Data Mining Techniques in Insurance Fraud Detection: The Case of Iranian Car Insurance

Given the ever-increasing spread of fraud in the insurance sector, especially in automobile insurance, and its negative consequences for insurance companies, it is essential to apply appropriate and efficient methods for identifying and detecting fraud in this area. Understanding the patterns in data on previously reported claims can help determine whether a damage claim is genuine or not. One of the most common and widely used ways of discovering patterns in data is the use of ...

AHP Algorithm and Un-supervised Clustering in Auto Insurance Fraud Detection

This thesis is a study on insurance fraud in Iran's automobile insurance industry. It explores the use of expert linkage between un-supervised clustering and the analytical hierarchy process (AHP), and presents the findings from applying these algorithms to automobile insurance claim fraud detection. The expert linkage determination objective function plan provides us with a way to determine whi...

Reconciliation of Unsupervised Clustering, Segmentation and Cohesion

This extended abstract examines the progress of a project on unsupervised language learning, and focuses on two different approaches to segmentation, as well as how cohesion may be generalized from its definitive morphosyntactic instantiation. It is intended as a discussion paper, and outlines the specific hypotheses currently being tested.

An Incremental Text Segmentation by Clustering Cohesion

This paper describes a new method, called IClustSeg, for linear text segmentation by topic using an incremental overlapped clustering algorithm. Incremental algorithms are able to process new objects as they are added to the collection and, according to the changes, to update the results using previous information. In our approach, we maintain a structure to get an incremental overlapped cluste...

Journal

Journal title: Journal of the American Statistical Association

Year: 2023

ISSN: 0162-1459, 1537-274X, 2326-6228, 1522-5445

DOI: https://doi.org/10.1080/01621459.2023.2191821